Deep Triphone Embedding Improves Phoneme Recognition
نویسندگان
چکیده
In this paper, we present a novel Deep Triphone Embedding (DTE) representation derived from Deep Neural Network (DNN) to encapsulate the discriminative information present in the adjoining speech frames. DTEs are generated using a four hidden layer DNN with 3000 nodes in each hidden layer at the first-stage. This DNN is trained with the tied-triphone classification accuracy as an optimization criterion. Thereafter, we retain the activation vectors (3000) of the last hidden layer, for each speech MFCC frame, and perform dimension reduction to further obtain a 300 dimensional representation, which we termed as DTE. DTEs along with MFCC features are fed into a second-stage four hidden layer DNN, which is subsequently trained for the task of tied-triphone classification. Both DNNs are trained using tri-phone labels generated from a tied-state triphone HMM-GMM system, by performing a forced-alignment between the transcriptions and MFCC feature frames. We conduct the experiments on publicly available TED-LIUM speech corpus. The results show that the proposed DTE method provides an improvement of absolute 2.11% in phoneme recognition, when compared with a competitive hybrid tied-state triphone HMM-DNN system.
منابع مشابه
Deep-hidden Conditional Neural Fields for Continuous Phoneme Speech Recognition
We have proposed Hidden Conditional Neural Fields (HCNF) for automatic speech recognition and shown the effectiveness by continuous phoneme recognition experiments on the TIMIT and the Japanese ASJ+JNAS corpora. In this paper, we propose to use an observation function with a deep structure in HCNF. The proposed deep observation function enables to use the deep neural networks in HCNF, which hav...
متن کاملDistinct triphone acoustic modeling using deep neural networks
To strike a balance between robust parameter estimation and detailed modeling, most automatic speech recognition systems are built using tied-state continuous density hidden Markov models (CDHMM). Consequently, states that are tied together in a tied-state are not distinguishable, introducing quantization errors inevitably. It has been shown that it is possible to model (almost) all distinct tr...
متن کاملImproving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
متن کاملHidden Conditional Neural Fields for Continuous Phoneme Speech Recognition
In this paper, we propose Hidden Conditional Neural Fields (HCNF) for continuous phoneme speech recognition, which are a combination of Hidden Conditional Random Fields (HCRF) and a MultiLayer Perceptron (MLP), and inherit their merits, namely, the discriminative property for sequences from HCRF and the ability to extract non-linear features from an MLP. HCNF can incorporate many types of featu...
متن کاملSpeech Recognition Using Monophone and Triphone Based Continuous Density Hidden Markov Models
Speech Recognition is a process of transcribing speech to text. Phoneme based modeling is used where in each phoneme is represented by Continuous Density Hidden Markov Model. Mel Frequency Cepstral Coefficients (MFCC) are extracted from speech signal, delta and double-delta features representing the temporal rate of change of features are added which considerably improves the recognition accura...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1710.07868 شماره
صفحات -
تاریخ انتشار 2017